{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "## Prerequisites:\n",
    "\n",
    "Before running this notebook, make sure you have gone through the steps listed below: \n",
    "\n",
    " - You have a workspace created https://docs.microsoft.com/en-us/azure/machine-learning/service/quickstart-get-started\n",
    " <br>\n",
    " - You have a development environment configured https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-environment"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%load_ext autoreload\n",
    "%autoreload 2\n",
    "\n",
    "import os\n",
    "from azureml.core import  (Workspace,Run,VERSION, \n",
    "                           Experiment,Datastore)\n",
    "from azureml.core.runconfig import (RunConfiguration,\n",
    "                                    DEFAULT_GPU_IMAGE)\n",
    "from azureml.core.conda_dependencies import CondaDependencies\n",
    "from azureml.core.compute import (AmlCompute, ComputeTarget)\n",
    "from azureml.exceptions import ComputeTargetException\n",
    "from azureml.data.data_reference import DataReference\n",
    "from azureml.pipeline.core import (Pipeline, \n",
    "                                   PipelineData)\n",
    "from azureml.pipeline.steps import PythonScriptStep\n",
    "from azureml.widgets import RunDetails\n",
    "import pandas as pd\n",
    "\n",
    "\n",
    "print(\"AML SDK version :\", VERSION)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "subscription_id = ''\n",
    "resource_group = ''\n",
    "workspace_name = ''"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "project_folder = os.getcwd()\n",
    "exp_name = \"facereco\"\n",
    "ws = Workspace(workspace_name = workspace_name,\n",
    "               subscription_id = subscription_id,\n",
    "               resource_group = resource_group)\n",
    "\n",
    "ws.write_config()\n",
    "print('Workspace loaded:', ws.name)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Data Store\n",
    "\n",
    "Whilst the preprocessed dataset have been made available, you can download from [here](https://amlgitsamples.blob.core.windows.net/facereco/fgnet.zip), upload it over to your Azure blob storage account and point to it in the cell below\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "account_name = \"amlgitsamples\"\n",
    "container_name = \"facereco\"\n",
    "datastore_name = 'fgnet'\n",
    "datastore = Datastore.register_azure_blob_container(workspace = ws, \n",
    "                                        datastore_name = datastore_name, \n",
    "                                        container_name = container_name,\n",
    "                                        account_name = account_name, \n",
    "                                        overwrite = True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Compute target \n",
    "\n",
    "Here we choose to execute the pipeline on Batch AI, but you can easily swap the compute target to other [supported types](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-ml-pipelines#key-advantages)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "code_folding": []
   },
   "outputs": [],
   "source": [
    "cluster_name = \"gpu-cluster\"\n",
    "\n",
    "try:\n",
    "    cluster = ComputeTarget(ws, cluster_name)\n",
    "    print(cluster_name, \"found\")\n",
    "    \n",
    "except ComputeTargetException:\n",
    "    print(cluster_name, \"not found, provisioning....\")\n",
    "    provisioning_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',max_nodes=1)\n",
    "\n",
    "    \n",
    "    cluster = ComputeTarget.create(ws, cluster_name, provisioning_config)\n",
    "    cluster.wait_for_completion(show_output=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Run configuration\n",
    "\n",
    "\n",
    "Here, we define the conda environment along with the packages dependencies needed by our training scripts along with the [run configuration](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-azure-machine-learning-architecture#run-configuration)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "cd = CondaDependencies()\n",
    "cd.add_conda_package('pandas')\n",
    "cd.add_channel(channel = 'menpo')\n",
    "cd.add_conda_package('matplotlib')\n",
    "#cd.add_conda_package('opencv')\n",
    "cd.add_conda_package('scikit-learn')\n",
    "\n",
    "cd.add_pip_package('tensorflow-gpu==1.14')\n",
    "cd.add_pip_package('keras==2.2.4')\n",
    "cd.add_pip_package('keras-vggface')\n",
    "cd.add_pip_package('opencv-python')\n",
    "\n",
    "\n",
    "run_config = RunConfiguration(framework=\"python\",\n",
    "                              conda_dependencies= cd)\n",
    "run_config.target = cluster\n",
    "run_config.environment.docker.enabled = True\n",
    "run_config.environment.docker.base_image = DEFAULT_GPU_IMAGE\n",
    "run_config.environment.python.user_managed_dependencies = False"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Pipeline data input/output\n",
    "\n",
    "We define a reference to the data store we registered earlier that point to the storage which contains the images.\n",
    "\n",
    "Note that the pipelineData objects uses the default the data store of the workspace."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "code_folding": []
   },
   "outputs": [],
   "source": [
    "images_dir = DataReference(data_reference_name = 'images', \n",
    "                             path_on_datastore = 'fgnet', \n",
    "                             mode =\"download\", \n",
    "                             datastore = datastore\n",
    "                          )\n",
    "default_datastore=ws.datastores[\"workspaceblobstore\"]\n",
    "\n",
    "metadata_dir = PipelineData(name = 'outputs')\n",
    "vggface_dir = PipelineData(name = 'vggface')\n",
    "pca_dir = PipelineData(name = 'pca')\n",
    "clf_dir = PipelineData(name = 'outputs')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Pipeline steps\n",
    " \n",
    "\n",
    "Below are the four steps files that makes up the pipeline. For detailed description of all steps, refer to the readme file [here](https://github.com/Azure/AMLSamples/blob/master/facereco/readme.md).\n",
    "    \n",
    "   - Step 1 metadata processing [file](./preprocess.py)\n",
    "   - Step 2 VGG-Face features extraction [file](./vggface_features.py)\n",
    "   - Step 3 Dimensionality reduction [file](./pca.py)\n",
    "   - Step 4 Classifier training [file](./classifier.py)\n",
    "   \n",
    "Next, we declare the steps that makes up the pipeline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "code_folding": [],
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "metadata_processing = PythonScriptStep(\n",
    "                            name = 'process images metadata',\n",
    "                            script_name = 'preprocess.py',\n",
    "                            arguments = ['--images_dir', images_dir,\\\n",
    "                                         '--metadata_path', metadata_dir],\n",
    "                            inputs = [images_dir],\n",
    "                            outputs = [metadata_dir],\n",
    "                            compute_target = cluster_name,\n",
    "                            runconfig = run_config\n",
    "                        )\n",
    "\n",
    "\n",
    "vggface_features = PythonScriptStep(\n",
    "                            name = 'VGG-face features extractor',\n",
    "                            script_name = 'vggface_features.py',\n",
    "                            arguments = ['--metadata_path', metadata_dir,\\\n",
    "                                         '--images_dir', images_dir,\\\n",
    "                                        '--vggface_path', vggface_dir],\n",
    "                            inputs = [metadata_dir, images_dir],\n",
    "                            outputs = [vggface_dir],\n",
    "                            compute_target = cluster_name,\n",
    "                            runconfig = run_config\n",
    "                        )\n",
    "\n",
    "pca_features = PythonScriptStep(\n",
    "                            name = 'PCA features extractor',\n",
    "                            script_name = 'pca.py',\n",
    "                            arguments = ['--vggface_path', vggface_dir,\\\n",
    "                                        '--pca_path', pca_dir],\n",
    "                            inputs = [vggface_dir],\n",
    "                            outputs = [pca_dir],\n",
    "                            compute_target = cluster_name,\n",
    "                            runconfig = run_config\n",
    "                        )\n",
    "\n",
    "classifier_step = PythonScriptStep(\n",
    "                            name = 'Fit classifier',\n",
    "                            script_name = 'classifier.py',\n",
    "                            arguments = ['--vggface_path', vggface_dir,\\\n",
    "                                        '--pca_path', pca_dir,\\\n",
    "                                        '--clf_path', clf_dir],\n",
    "                            inputs = [vggface_dir, pca_dir],\n",
    "                            outputs = [clf_dir],\n",
    "                            compute_target = cluster_name,\n",
    "                            runconfig = run_config\n",
    "                        )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Pipeline execution\n",
    "\n",
    "Finally we put it all together, construct an experiment and train the pipeline."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pipeline = Pipeline(default_datastore=ws.datastores[\"workspaceblobstore\"],\n",
    "                description = 'face recognition pipeline', \n",
    "                default_source_directory = project_folder,\n",
    "                workspace = ws,\n",
    "                steps = [classifier_step]\n",
    "                   )\n",
    "\n",
    "pipeline_run = Experiment(workspace=ws, name =\"Face_recognition_exp\").submit(pipeline, regenerate_outputs=True)\n",
    "RunDetails(pipeline_run).show()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python [conda env:amlsamples_env]",
   "language": "python",
   "name": "conda-env-amlsamples_env-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  },
  "varInspector": {
   "cols": {
    "lenName": 16,
    "lenType": 16,
    "lenVar": 40
   },
   "kernels_config": {
    "python": {
     "delete_cmd_postfix": "",
     "delete_cmd_prefix": "del ",
     "library": "var_list.py",
     "varRefreshCmd": "print(var_dic_list())"
    },
    "r": {
     "delete_cmd_postfix": ") ",
     "delete_cmd_prefix": "rm(",
     "library": "var_list.r",
     "varRefreshCmd": "cat(var_dic_list()) "
    }
   },
   "types_to_exclude": [
    "module",
    "function",
    "builtin_function_or_method",
    "instance",
    "_Feature"
   ],
   "window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
